The Analysis of a Probabilistic Approach to Nearest Neighbor Searching
Abstract
Let S be a set of n data points in some metric space. Given a query point q in this space, a nearest neighbor query asks for the point of S nearest to q. Throughout, we assume that the space is real d-dimensional space R^d and that the metric is Euclidean distance. The goal is to preprocess S into a data structure so that such queries can be answered efficiently. Nearest neighbor searching has applications in many areas, including data mining [7], pattern classification [5], and data compression [10]. Because many applications involve large data sets, we are interested in data structures that use linear storage space. Naively, nearest neighbor queries can be answered in O(dn) time through brute-force search. Although nearest neighbor searching can be performed efficiently in low-dimensional spaces, for all known exact linear-space data structures, search times grow exponentially as a function of dimension. Thus, for reasonably large dimensions, brute-force search is often the most efficient method in practice. One approach to reducing the search time is approximate nearest neighbor search, and a number of data structures for approximate nearest neighbor searching have been proposed [1, 3, 11]. The phenomenon of concentration of distance would suggest that approximate nearest neighbor searching is meaningless. Fortunately, the distributions that arise in applications tend to be clustered in lower-dimensional subspaces [6]. Good search algorithms take advantage of this low-dimensional clustering. The fundamental problem that motivates this work is the lack of predictability in existing practical approaches to nearest neighbor searching. In high dimensions, exact search is no better than brute-force, and approximate search algorithms are acceptably fast only when the allowed...
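The O(dn) brute-force baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the method analyzed in the paper; the function names `squared_distance` and `brute_force_nearest` and the sample data are assumptions introduced here.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two d-dimensional points (O(d) time)."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(p, q))


def brute_force_nearest(points: Sequence[Point], q: Point) -> Tuple[int, float]:
    """Scan all n points and return (index, distance) of the one nearest to q.

    Each of the n distance computations costs O(d), so a query costs O(dn).
    No preprocessing is done and no storage beyond the points is needed.
    """
    if not points:
        raise ValueError("points must be non-empty")
    best_index = 0
    best_sq = squared_distance(points[0], q)
    for i in range(1, len(points)):
        sq = squared_distance(points[i], q)
        if sq < best_sq:
            best_index, best_sq = i, sq
    # Comparing squared distances preserves the ordering; take the root once.
    return best_index, math.sqrt(best_sq)


# Hypothetical sample data: three points in the plane and one query point.
S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
index, dist = brute_force_nearest(S, (0.9, 1.2))
```

Here `index` identifies the point of `S` closest to the query. Comparing squared distances avoids a square root per point, which changes the constant factor but not the O(dn) bound.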
Similar resources
Non-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
EFFECT OF THE NEXT-NEAREST NEIGHBOR INTERACTION ON THE ORDER-DISORDER PHASE TRANSITION
In this work, one- and two-dimensional lattices are studied theoretically by a statistical mechanical approach. The nearest and next-nearest neighbor interactions are both taken into account, and the approximate thermodynamic properties of the lattices are calculated. The results of our calculations show that: (1) even though the next-nearest neighbor interaction may have an insignificant ef...
Probably correct k-nearest neighbor search in high dimensions
A novel approach for k-nearest neighbor (k-NN) searching with Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k-nearest neighbors is obtained in high probability, is proposed for greatly reducing the searching time. We ...